Skip to content

Unify process termination on POSIX & Windows (+ tests) #1044

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

felixweinberger
Copy link
Contributor

@felixweinberger felixweinberger commented Jun 26, 2025

Motivation and Context

#555, #559, #765, #729, and #850 address important but different ways MCP servers can hang or fail to successfully terminate on Windows & POSIX systems.

This PR adds regressions tests to clearly demonstrate the edge cases addressed + unifies the approaches from the PRs listed above.

How Has This Been Tested?

New regression tests added (each fails without the relevant fix). These were developed and tested on both POSIX and Windows systems.

For #555 and #559 (guard for MCP servers that don't react to SIGTERM):

For #765 (align shutdown sequence with MCP spec):

  • Original author: @davenpi
  • See commit 2 (implementation) & 4 (refactor only)

For #729 and #850 (properly terminate processes and their children across platforms):

Breaking Changes

None.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

@felixweinberger felixweinberger force-pushed the fweinberger/stream-cleanup-only-approach branch 2 times, most recently from 9f88ea7 to a27fcef Compare June 26, 2025 21:31
@felixweinberger felixweinberger force-pushed the fweinberger/stream-cleanup-only-approach branch 6 times, most recently from caa5628 to 1d495ca Compare June 30, 2025 14:55
@felixweinberger felixweinberger changed the title Add regression tests for #559 and #555 Add regression tests for #559, #555, #765 Jun 30, 2025
@felixweinberger felixweinberger changed the title Add regression tests for #559, #555, #765 Add regression tests for #555, #559, #765 Jun 30, 2025
@felixweinberger felixweinberger force-pushed the fweinberger/stream-cleanup-only-approach branch from 9e8d74b to 3f6c472 Compare June 30, 2025 15:39
@felixweinberger felixweinberger changed the title Add regression tests for #555, #559, #765 [WIP] Add regression tests for #555, #559, #765, #729, #850 Jul 1, 2025
@felixweinberger felixweinberger force-pushed the fweinberger/stream-cleanup-only-approach branch 15 times, most recently from cbaee2e to e8954dd Compare July 1, 2025 17:45
@felixweinberger felixweinberger changed the title [WIP] Add regression tests for #555, #559, #765, #729, #850 Unify process termination on POSIX & Windows and add comprehensive regression testing Jul 1, 2025
@felixweinberger felixweinberger changed the title Unify process termination on POSIX & Windows and add comprehensive regression testing Unify process termination on POSIX & Windows (+ tests) Jul 1, 2025
@felixweinberger felixweinberger force-pushed the fweinberger/stream-cleanup-only-approach branch from e8954dd to 90f463e Compare July 1, 2025 18:38
felixweinberger and others added 4 commits July 1, 2025 19:40
The stdio cleanup was hanging indefinitely when processes ignored
termination signals or took too long to exit. This caused the MCP
client to freeze during shutdown, especially with servers that don't
handle SIGTERM properly.

The fix introduces a 2-second timeout for process termination. If a
process doesn't exit gracefully within this window, it's forcefully
killed. This ensures the client always completes cleanup in bounded
time while still giving well-behaved servers a chance to exit cleanly.

This resolves hanging issues reported when MCP servers ignore standard
termination signals or perform lengthy cleanup operations.

Also adds regression tests for #559.

resolves #555

Co-authored-by: Cristian Pufu <[email protected]>
The MCP specification recommends closing stdin first to allow servers
to exit gracefully before resorting to signals. This approach gives
well-behaved servers the opportunity to detect stdin closure and
perform clean shutdown without forceful termination.

The shutdown sequence now follows a graceful escalation path: first
closing stdin and waiting 2 seconds for voluntary exit, then sending
SIGTERM if needed, and finally using SIGKILL as a last resort. This
minimizes the risk of data loss or corruption while ensuring cleanup
always completes.

This unified approach works consistently across all platforms and
improves compatibility with MCP servers that monitor stdin for
lifecycle management.

resolves #765

Co-authored-by: davenpi <[email protected]>
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729

Co-authored-by: jingx8885 <[email protected]>
Co-authored-by: Surya Prakash Susarla <[email protected]>
Refactor the MCP spec-compliant stdio shutdown sequence into a separate
_shutdown_process function to document that this is an MCP specific
shutdown sequence. Logic remains unchanged.

Co-authored-by: davenpi <[email protected]>
@felixweinberger felixweinberger force-pushed the fweinberger/stream-cleanup-only-approach branch from 90f463e to 50559f7 Compare July 1, 2025 18:46
@felixweinberger felixweinberger marked this pull request as ready for review July 1, 2025 18:57
@felixweinberger felixweinberger requested a review from dsp-ant July 1, 2025 19:06
felixweinberger added a commit that referenced this pull request Jul 1, 2025
The test now demonstrates that:
1. The cleanup issue affects all platforms, not just Windows
2. Closing stdin and waiting for graceful exit allows cleanup to run

This confirms that PR #1044's approach of implementing the MCP spec's
stdio shutdown sequence (close stdin first, then wait, then terminate)
could address issue #1027.

Github-Issue:#1027
felixweinberger added a commit that referenced this pull request Jul 1, 2025
The test now demonstrates that:
1. The cleanup issue affects all platforms, not just Windows
2. Closing stdin and waiting for graceful exit allows cleanup to run

This confirms that PR #1044's approach of implementing the MCP spec's
stdio shutdown sequence (close stdin first, then wait, then terminate)
could address issue #1027.

Github-Issue:#1027
felixweinberger added a commit that referenced this pull request Jul 1, 2025
The test_1027_win_unreachable_cleanup.py file now contains all necessary
tests to demonstrate issue #1027 and verify PR #1044 fixes it.
felixweinberger added a commit that referenced this pull request Jul 1, 2025
This test shows that MCP server cleanup code in lifespan doesn't run
when the process is terminated, but does run when stdin is closed first
(as implemented in PR #1044).

The test includes:
- Demonstration of current broken behavior (cleanup doesn't run)
- Verification that stdin closure allows graceful shutdown
- Windows-specific ResourceWarning handling
- Detailed documentation of the issue and solution

Github-Issue:#1027
felixweinberger added a commit that referenced this pull request Jul 1, 2025
This test shows that MCP server cleanup code in lifespan doesn't run
when the process is terminated, but does run when stdin is closed first
(as implemented in PR #1044).

The test includes:
- Demonstration of current broken behavior (cleanup doesn't run)
- Verification that stdin closure allows graceful shutdown
- Windows-specific ResourceWarning handling
- Detailed documentation of the issue and solution

Github-Issue:#1027
felixweinberger added a commit that referenced this pull request Jul 1, 2025
This test shows that MCP server cleanup code in lifespan doesn't run
when the process is terminated, but does run when stdin is closed first
(as implemented in PR #1044).

The test includes:
- Demonstration of current broken behavior (cleanup doesn't run)
- Verification that stdin closure allows graceful shutdown
- Windows-specific ResourceWarning handling
- Detailed documentation of the issue and solution

Github-Issue:#1027
felixweinberger added a commit that referenced this pull request Jul 1, 2025
This test shows that MCP server cleanup code in lifespan doesn't run
when the process is terminated, but does run when stdin is closed first
(as implemented in PR #1044).

The test includes:
- Demonstration of current broken behavior (cleanup doesn't run)
- Verification that stdin closure allows graceful shutdown
- Windows-specific ResourceWarning handling
- Detailed documentation of the issue and solution

Github-Issue:#1027
felixweinberger added a commit that referenced this pull request Jul 1, 2025
This test shows that MCP server cleanup code in lifespan doesn't run
when the process is terminated, but does run when stdin is closed first
(as implemented in PR #1044).

The test includes:
- Demonstration of current broken behavior (cleanup doesn't run)
- Verification that stdin closure allows graceful shutdown
- Windows-specific ResourceWarning handling
- Detailed documentation of the issue and solution

Github-Issue:#1027
felixweinberger added a commit that referenced this pull request Jul 1, 2025
This test shows that MCP server cleanup code in lifespan doesn't run
when the process is terminated, but does run when stdin is closed first
(as implemented in PR #1044).

The test includes:
- Demonstration of current broken behavior (cleanup doesn't run)
- Verification that stdin closure allows graceful shutdown
- Windows-specific ResourceWarning handling
- Detailed documentation of the issue and solution

Github-Issue:#1027
felixweinberger added a commit that referenced this pull request Jul 1, 2025
This test shows that MCP server cleanup code in lifespan doesn't run
when the process is terminated, but does run when stdin is closed first
(as implemented in PR #1044).

The test includes:
- Demonstration of current broken behavior (cleanup doesn't run)
- Verification that stdin closure allows graceful shutdown
- Windows-specific ResourceWarning handling
- Detailed documentation of the issue and solution

Github-Issue:#1027
@jingx8885
Copy link

jingx8885 commented Jul 2, 2025

Verification of Windows Process Tree Termination Fix
(Originally encountered in #729 context)

Scenario:
In a Python MCP server (uv-pyproject structure) using stdio mode launched via:
uv run --env-file=xxx mcp_server.py

We observed the following process hierarchy:

Python parent process  
  → Python subprocess (middle layer)  
    → uv.exe  
      → MCP stdio service  

Failure:
Terminating the middle Python subprocess failed to terminate its descendants, leaving:

  1. uv.exe process
  2. MCP stdio service process

...as orphaned/zombie processes on Windows.

Validation:
➡️ This specific failure mode was resolved by the process termination logic **originally implemented in #729 **.
➡️ The equivalent solution now exists in #1044's unified approach.
➡️ This logic has been production-validated on Windows 11 across:

  • Live production systems
  • Daily development/testing cycles
  • High-frequency restart scenarios

Conclusion:
#1044 successfully addresses this Windows-specific process tree termination flaw. The child process cleanup behavior required for this scenario is correctly implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants